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1 Introduction 



Progress on the determination of the parton distribution functions (PDF's) of the nucleon, from 
global quantum chromodynamics (QCD) analysis of hard scattering processes, is central to precision 
standard model (SM) phenomenology, as well as to new physics searches, at lepton-hadron and 
hadron-hadron colliders. There have been many new developments in recent years, beyond the 
conventional analyses that underlie the widely used PDF's [Q-^. These developments have been 
driven by the need to quantify the uncertainties of the PDF's and their physical predictions 0-12 



We report in this paper on a comprehensive new global QCD analysis based on the most current 
data, and on the recently developed methods of uncertainty study of [10-12|. This new analysis 
includes a full treatment of all available correlated experimental errors, as well as an extensive 
exploration of the parametrization of the input nonperturbative PDF's. 

Although this work is built on the series of previous CTEQ parton distributions Q, it repre- 
sents more than an evolutionary updating of previous work to incorporate new experimental data 
sets. The methodology of [10-12| goes beyond the traditional paradigm of producing some sub- 
jectively chosen "best fits." It introduces a set of efficient and practical tools to characterize the 
parton parameter space in the neighborhood of the global minimum. This allows the systematic 
exploration of the uncertainties of parton distributions and their physical predictions due to known 
experimental errors and due to the input theoretical model parameters. 

There are many complex issues involved in a comprehensive global parton distribution anal- 
ysis. Foremost among these on the experimental side is the "imperfection" of real experimental 
data compared to textbook behavior — for instance, some experimental measurements appear to 
be statistically improbable because the / N deviates from 1 substantially more than the ex- 
pected ±^y2/N■, or different precision experimental measurements of the same physical quantities 
appear to be statistically incompatible in all regions of the model parameter space. The methods 
of ||lO|-[T^ cannot resolve these problems — no global analysis method can — but the tools developed 
in this formalism make it possible to look deeper into some of these problems, in order to assess 
the acceptability and compatibility of the affected data sets in more practical terms, and to suggest 
pragmatic ways to deal with the apparent difficulties. These detailed studies were not possible in 
the conventional analyses [^|-j3| and |13]. On the theoretical side, the uncertainties on the pertur- 
bative QCD (PQCD) calculations of the various physical processes included in the global analysis 
are not easily quantified in a uniform way. We do not address that problem here. 

Section § summarizes the experimental and theoretical input to the global analysis, empha- 
sizing the new elements in data sets and in methodology. Section |3| presents results on the new 
generation of PDF's: the "standard" CTEQ6 sets, as well as the eigenvector sets that characterize 
the uncertainties. It also includes discussions of various physics issues relating to these global fits. 
Section ^ presents some general results on uncertainties of physics predictions due to PDF's, in 
the form of parton-parton luminosity functions for the Tevatron Runll and for the LHC. Section ^ 
compares this work to previous studies of PDF uncertainties [Q-Q (primarily on DIS experiments), 
and to a recent general PDF analysis [13|. Four appendices concern: (^) detailed information on 
the new standard parton distribution set CTEQ6M; (P) new formulas and tools for understanding 
the significance of values for individual experiments and estimating ranges of uncertainties in 
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the comparison of fits with data; (j^) assessing the need for higher-twist (power-law) contributions 
to the theory model; and (^) studying the impact of the flexibility associated with the choice of 
parametrization for the nonperturbative PDF's. 



2 New Input and Methodology 

The next-to-leading order (NLO) global QCD analysis carried out in this work is built on the 
same basis as the previous CTEQ parton distribution sets |2|. In this section, we describe the new 
experimental input and the new theoretical techniques, which together have enabled substantial 
progress in this generation of global analysis. 



2.1 Experimental Data Sets 



Since the CTEQ5 analysis, many new experimental data sets have become available for an improved 
determination of parton distributions. Particularly noteworthy are the recent neutral current deep- 
inelastic scattering (DIS) structure function measurements of HI [14| and ZEUS and the 
inclusive jet cross section measurement of D0 [|16| (in several rapidity bins, up to a rapidity of 
3). The greater precision and expanded (x, Q) ranges compared to previous data in both processes 
provide improved constraints on the parton distributions. Other recent data used in the analysis are 



the updated E866 measurements of the Drell-Yan deuteron/proton ratio |17| and the re- analyzed 
CCFR measurement of i<2 [18|. 

These new results complement the fixed-target DIS experiments of BCDMS [^,|2^, NMC 
|2l|, CCFR Fs the Drell-Yan measurement of E605 [||], the CDF measurement of VF-lepton 
asymmetry |24], and the CDF measurement of inclusive jets which this study shares with the 
earlier CTEQ5 analysis. Even for these older experiments, our new analysis, by including correlated 
systematic errors (cf. Sec. 2.2 and Appendix B.l), incorporates more details than previous 
and recent analyses. For instance, for the BCDMS and NMC experiments we now use the data 
sets measured at separate energies (which contain full information on correlated errors) instead 
of the combined data sets (which, due to re-binning of data points, retain only some effective 
point-to-point uncorrelated errors of uncertain statistical significance). 



2.2 Function and Treatment of Correlated Systematic Errors 

The very extensive and precise DIS data from fixed-target and HERA experiments provide the 
backbone of parton distribution analysis. In order to make full use of the experimental constraints, 
it is important to incorporate the available information on correlated systematic errors P-|7| p!0|- 



12|. The same is true for recent data on inclusive jets [16, 25|, where the experimental uncertainties 
are dominated by systematics. Thus, for the first time in the CTEQ series of analyses, we have 
included the correlated errors wherever available. Our analysis is simplified by a novel way of 
treating systematic errors, formulated in (cf. also the recent review |2^). As an introduction 
to subsequent discussions based on this method, we mention its basic ideas and main features here. 



A more complete summary of this method is given in Appendix B.l, as needed to explain the 
detailed comparison between data and fits discussed in that section. 
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In global fits not including correlated errors, one would minimize a naive global function 
defined simply as Xo = Sexpt 'l2i^=i i^i l^'i-> where A'^e is the number of data points in 

experiment e, Di is a data value, Tj is the corresponding theory value (which depends on the PDF 
model), and cr-^ = o"? + is the statistical {pi) and point-to-point systematic (S^) errors added in 
quadrature. The function Xg provides the simplest means to search for candidates of "good" global 
fits, but has rather limited use in assessing uncertainties of the resulting fits. 

If there are K sources of correlated systematic errors, specified by standard deviations 
{Pit, P2i, ■ ■ ■ , Pki}, in addition to an uncorrelated systematic error Ui for data point i, then a 
standard method to improve the treatment of experimental errors is to construct the covari- 
ance matrix Vij = af6ij + Ylk=i f^kiPkj, from which a global function is defined by x^ = 
Eexpt. Eij=i (A - Ti) Vr:^ {Dj - Tj). (Here af = af + uf.) An alternative method is to add 
K parameters, each associated with one systematic error, and to minimize an extended x^ function 
with respect to the combined set of experimental and theoretical parameters (denoted by x'^ in 
Appendix B.l, Eq. (|^). These well-known methods are equivalent. Both face some practical, even 



formidable, problems in the context of global QCD analysis because of the large number of data 
points (which can make the inversion of the covariance matrix numerically unstable) and the large 
number of fitting parameters (which becomes unmanageable when all systematic errors from all 
experiments are included). 



The method formulated in |12| overcomes these difficulties by solving the problem of opti- 
mization with respect to the correlated systematic errors analytically. The result is an effective 
X^ minimization problem with respect to only the theory parameters, as for the simple case. The 



resulting x^ function has the form (cf. Eq. (10)): 




- Bk{A-'),,,B,,\, (1) 

expt. I i=l k,k'=l 



where {Bf.} is a A'e-component vector, {^fcfc'} is a x Kf. matrix, and 

„ ^ Pki {Di - Ti) PkiPk'i fr,^ 

-Dfc — / ^ 2 ' ^^kk' — <Jkk' + / ^ 2 ' V^i 

1=1 ' 1=1 ' 

Not only is the matrix inversion simpler and more stable (/Cg x -^e with Kg, < 10, compared to 
A'^e X -^e with A'^e SiS large as 300, in the correlation matrix approach), the explicit formula for 
the contribution of the various sources of systematic errors in Eq. (Q) provides a useful tool to 



evaluate the uncertainties in detailed analysis of the fits — as explained in Appendix B.l and used 



in discussions in subsequent sections — which is not available in the traditional approach. 
2.3 Methods for Analysis of Uncertainties 



The widely used parton distributions |l|-|3, 13 1 are obtained from global analyses using a "best- 
fit" paradigm, which selects the global minimum of the chosen function. However, the "best" 
PDF's are subject to various uncertainties. Within this paradigm, the problem of estimating 
uncertainties of the PDF's can only be addressed in an ad hoc manner, usually by examining 
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alternative fits obtained by subjective tuning of selected degrees of freedom. Recent efforts to assess 
the uncertainties objectively, using established statistical methods, have been mostly concerned with 
the precision DIS experimental data Q-^, rather than the global analysis of all hard scattering data. 
As mentioned in the introduction, there are formidable complications when standard statistical 
methods are applied to global QCD analysis. The basic problem is that a large body of data from 
many diverse experiments]^ which are not necessarily compatible in a strict statistical sense, is being 
compared to a theoretical model with many parameters, which has its own inherent theoretical 
uncertainties. 

In recent papers ||To|-p!^, we have formulated two methods, the Hessian and the Lagrange, 
which overcome a number of long-standing technical problems encountered in applying standard 
error analysis to the complex global analysis problem. We are now able to characterize the behavior 
of the function in the neighborhood of the global minimum in a reliable way. This provides a 
systematic method to assess the compatibility of the data sets in the framework of the theoretical 





model |27|, and to estimate the uncertainties of the PDF's and their physical predictions within 
a certain practical tolerance. The basic ideas are summarized in the accompanying illustration, 
adapted from [ pT|| : 

2-dim ( i,j) rendition ofd-dim ( ~20) PDF parameter space 

contours of constant X^gi^i^ai 
uf. eigenvector in the l-direction 
p((): point of largest a,- with tolerance T 
Sq. global minimum 

diagonalization and 

rescaling by ^ 
the iterative method 

• Hessian eigenvector basis sets 

(b) 

Original parameter basis Orthonormal eigenvector basis 

The behavior of the global function in the neighborhood of the minimum in the PDF parameter 
space is encapsulated in 2Np sets of eigenvector PDF's (where Np ~ 20 is the number of free PDF 
parameters), represented by the solid dots in the illustration. These eigenvectors are obtained by 
an iterative procedure to diagonalize the Hessian matrix, adjusting the step sizes of the numerical 
calculation to match the natural physical scales. This procedure efficiently overcomes a number of 
long-standing obstacles^ encountered when applying standard tools to perform error propagation 
in the global minimization approach. Details are given in |1C, 11|. 

The uncertainty analysis for our new generation of PDF's makes full use of this method. The 
result is 2Np + 1 PDF sets, consisting of the best fit 5o and eigenvector basis sets in the plus and 

^For our analysis, there are ~ 1800 data points from ~ 15 different sets of measurements with very different 
systematics and a wide range of precision. 

^The obstacles are due to difficulties in calculating physically meaningful error matrices by finite differences, in the 
face of (i) vastly different scales of eigenvalues (~ 10^) in different, a priori unknown, directions in the high-dimension 
parameter space, and (ii) numerical fluctuations due to (multi-dimensional) integration errors in the theoretical 
(PQCD) calculation and round-off errors. 
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minus directions along each eigenvector. From these PDF sets we can calculate the best estimate, 
and the range of uncertainty, for the PDF's themselves and for any physical quantity that depends 
on them. The uncertainty can be computed from the simple master formula 

AX = - l^[X(5+)-X(Sr)]M , (3) 

where X is the observable, and X{Sf) are the predictions for X based on the PDF sets Sf from 
the eigenvector basis. 



2.4 Perturbative QCD Parameters and Input 

The fundamental parameters of perturbative QCD are the coupling ag and the quark masses. In 
principle, these parameters can be determined in the global fit, along with the nonperturbative 
parton distribution functions. In practice, they are determined more precisely and definitively in 
dedicated measurements. We therefore treat these parameters as input to our global analysis. For 



Us, we use as{Mz) = 0.118, the Particle Data Group average |^. (However, cf. the discussion 



on leaving a<j free in the fitting in Sec. 3.3.4 and Sec. ^^■) The light quarks u,d,s are treated as 
massless, as usual. In the AIS scheme, the evolution kernels of the PDF's are mass-independent: 
charm and bottom masses enter only through the scales at which the heavy quark flavors are turned 
on in the evolution. We take nic = 1.3 GeV and m;, = 4.5 GeV. 

The hard matrix elements that enter into the PQCD calculations are all taken to be at NLO 
(except for the LO fit, of course). Although it is now possible to incorporate charm and bottom 
mass effects in the NLO hard cross sections, in the so-called variable flavor number scheme (VFNS) 



1 29 -32 1, we have decided to use the conventional (zero-mass parton) hard cross sections in our 
standard analysis for three practical reasons. First, hard cross-section calculations in the VFNS 
(with non-zero parton masses) are not yet available for processes other than DIS. Second, even in 
DIS, there are different ways to implement the VFNS. This introduces yet another type of "scheme 
dependence", and hence a source of confusion for users of PDF's. (A recently proposed "natural" 



implementation of the VFNS [33|, if widely accepted, could alleviate this source of confusion.) 



Finally, since most users of PDF's only have at their disposal the standard hard cross sections for 
zero-mass partons, it is preferable to have PDF's determined by a global analysis calculated in the 
same way as they are used. In a subsequent paper, we will study the heavy quark mass effects in 
global analysis in detail, and publish PDF's determined in the VFNS with a full treatment of quark 



mass effects, using the natural implementation of [33|. 



2.5 Parametrization of Nonperturbative Input PDF's 

The nonperturbative input to the global analysis are PDF's specified in a parametrized form at 
a fixed low-energy scale Qq = 1.3 GeV .|^ The particular functional forms, and the value of Qo, 

^In CTEQ5 and other recent global fits, the slightly lower value Qo = 1 GeV was used. Because QCD evolution 
is quite rapid in the Q ~ 1—2 GeV range, one gets enhanced sensitivity to the parameters when starting from a 
relatively low scale. However, that rapid variation, which occurs below the region of applicability of PQCD, can 
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are not crucial, as long as the parametrization is flexible enough to describe the range of behavior 
permitted by the available data at the level of accuracy of the data. Intuitive notions of smoothness 
may be built into the input functions. The PDF's at all higher Q are determined from the input 
functions by the NLO perturbative QCD evolution equations. 
The functional form that we use is 

X fix, Qo) = Ao (1 - x)^'' e^3^ (1 + e^^ x)^^ (4) 

with independent parameters for parton flavor combinations = u — u, = d — d, g, and u + d . 
We assume s = s = 0.2 (n + d) at Qq. The form is "derived" by including a 1:1 Pade expansion 
in the quantity d [log(x/)] /dx. This logarithmic derivative has an especially simple form for the 
time-honored canonical parametrization x f{x) = Aqx^^ (1 — x)^^. For our parametrization there 
are poles at x = and x = 1 to represent the singularities associated with Regge behavior at small 
X and quark counting rules at large x, along with a ratio of (linear) polynomials to describe the 
intermediate region in a smooth way. 

Equation (Q) provides somewhat more versatility than previous CTEQ parametrizations. For 
some flavors, it has more freedom than needed, so that not all of the parameters can be determined 
by the present data. In those cases, some parameters are held fixed, guided by the Hessian method 



in the final fitting process as discussed in Sec. 3.2 and in Ref. [11|. In the future, more data may 



find even this parametrization too restrictive — first for the very well constrained n(x) — and it will 
be natural to proceed to a 2:2 Pade form. To distinguish the d and u distributions, we parametrize 
the ratio d/u, as a sum of two terms: 

dix, Qo)/uix, Qo) = Ao x^' (1 - x)^2 + (1 + x) (1 - x)^^ . (5) 

Altogether we use 20 free shape parameters to model the PDF's at Qo- 

An extensive study of the sensitivity of the global fits to the choice of parametrization has been 
carried out in the process of this analysis, by trying different parametrizations, and by using different 
values of Qo (which is equivalent to changing the functional forms). In the main body of this paper, 
we concentrate on results obtained with the standard choices described above. Comments on the 
effects of parametrization on the physics results will be made in the text as appropriate. Some 
studies of results obtained with alternative parametrizations are described in Appendix 

3 Results on New Parton Distributions 

With the theoretical and experimental input, methods, and parametrizations described above, we 
arrive at a standard set of parton distributions (the nominal "best fit") together with a complete 
set of eigenvector parton distribution sets that characterizes the neighborhood of acceptable global 
fits in the parton parameter space. The study is carried out mainly in the MS scheme.0 We now 
discuss the salient features of the results and the related physics issues. 

produce unusual behavior of the functions at small Q, which may have no physical significance. Cf. the discussions 
in Sec. and Appendix ^ 

*For the convenience of certain applications, we also present one standard set each of parton distributions in the 



DIS scheme and at leading order. Cf. Sec. 3.1.3 
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3.1 The New Standard PDF Sets 



The standard set of parton distributions in the MS scheme, referred to as CTEQ6M, provides an 



excellent global fit to the data sets listed in Sec. 2.1. An overall view of these PDF's is shown in 
Fig. ^ at two scales Q = 2 and 100 GeV. The overah for the CTEQ6M fit is 1954 for 1811 
data points. The parameters for this fit and the individual values for the data sets are given in 
Appendix]^. In the next two subsections, we discuss the comparison of this fit to the data sets, and 
then describe the new features of the parton distributions themselves. Quantitative comparison of 
data and fit is studied in more depth in Appendix ^ 

Fig. 1 : Overview of the CTEQ6M parton distribution functions at Q = 2 and 100 GeV. 





3.1.1 Comparison with Data 

The fact that correlated systematic errors are now fully included in the fitting procedure allows a 
more detailed study of the quality of fits than was possible in the past. We can take the correlated 
systematic errors into account explicitly when comparing data and theory, by using the procedure 



discussed in Sec. B.2 of Appendix M. In particular, based on the formula for the extended x 
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function expressed in the simple form Eq. (11), we obtain a precise graphical representation of the 
quality of the fit by superimposing the theory curves on the shifted data points {Di} containing 
the fitted systematic errors. The remaining errors are purely uncorrelated, hence are properly 
represented by error bars. We use this method to present the results of our fits whenever possible. 
Figure ^ shows the comparison of the CTEQ6M fit to the latest data of the HI experiment 



|14|. The extensive data set is divided into two plots: (a) for x < 0.01, and (b) for x > 0.01. In 
order to keep the various x bins separated, the values of F2 on the plot have been offset vertically 
for the kth bin according to the formula: ordinate = F2{x,Q'^) + 0.15 k. The excellent fit seen 
in the figure is supported by a x? value of 228 for 230 data points. Similarly, Fig. H shows the 



comparison to the latest data from ZEUS |15|. One again sees very good overall agreement. 
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Fig. 2 : Comparison of the CTEQ6M fit to the HI data [14| in separate x bins. The data points 
include the estimated corrections for systematic errors. The error bars contain statistical only. 
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Fig. 3 : Comparison of the CTEQ6M fit to the ZEUS data @ . Same format as Fig. 
2.5 
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The value is 263 for 229 data points. This is 2a (cr = \/2iV = 21) away from the ideal value of 
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N = 229. A closer inspection of Fig. |^ does not suggest any systematic disagreement. To assess 
the significance of this 2a effect, we examine in detail the systematic shifts obtained in the fit in 



Appendix B.3. We find that they are all quite reasonable, thus giving us confidence that the fit is 
indeed of good quality. 

The new PDF's also fit the older fixed-target DIS experiments well — similar to previous 
global analyses. Figure ^ shows the comparison to the fixed-target neutral current experiments 
BCDMS and NMC. Because we are incorporating the fully correlated systematic errors, the data 
sets used for these experiments are those obtained at each measured incoming energy, rather than 
the "combined" data sets that are usually shown. This more detailed and quantitative comparison 
is important when we try to evaluate the statistical significance of the fits in our uncertainty analysis 
(cf . Appendix ^ . 

Fig. 4 : Comparison of the CTEQ6M fit with the BCDMS [|l9| and NMC ^ data on fip DIS. 
Same format as Fig. |2|. (The offset for the kth Q value in (b) is 0.2k.) 
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The per data point for these data sets are 1.11 (378/339) for BCDMS and 1.52 (305/201) for 
NMC. The fit to the BCDMS data is clearly excellent, both by inspection of Fig. ^a and by the 
normal test. For the NMC data. Fig. ^b shows rather good overall agreement, but with some 
notable large fluctuations away from the smooth theory curves. The most noticeable fluctuations — 
points with almost the same {x, Q) values — are from data sets taken at different incoming energies.^ 
This is reflected in the x"^ value which is quite a bit larger than expected for a normal probability 
distribution. This raises two issues: (i) Is the fit acceptable or unacceptable? (ii) Can the fit be 

^These fluctuations are smoothed out by re-binning and other measures in the combined data set which is 
not used here. 
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substantially improved by a different theoretical model, e.g., the inclusion of higher- twist terms? 
The first question is addressed in Sec. B.3, where the excess is shown to be attributable to 



larger-than-normal fluctuations of the data points (which is unrelated to the viability of the theory 
model), and that the shifts due to systematic errors (which is related to the goodness-of-fit) are all 
within the range of normal statistics. The second question is addressed in Sec. |3.3.2 and Appendix 
0, where we show that the inclusion of higher-twist contributions does not result in a substantial 
improvement of the fit. 

The charged-current DIS data from the CCFR experiment are also used in our global anal- 
ysis. The agreement between data and theory for the CCFR measurements of F2 and F3 is good, 
comparable to previous analyses. Because of the important role of the charm quark in the quan- 
titative treatment of this process, and the lack of information on correlations between the current 
F2 and F3 data,0 we defer a detailed study of the charged-current DIS fit to the forthcoming global 
analysis including heavy quark mass effects. 

Of the new experimental input, perhaps the most interesting and significant in its impact 



on the parton distributions is the one-jet inclusive cross section from D0 |16]. These new data 
represent a considerable expansion in kinematic range over the previous jet measurements, by 
providing measurements in 5 separate rapidity bins, with r] up to 3.0. Our fits agree extremely well 
with these data, in both shape and magnitude over the full kinematic range, as shown in Fig. ^a. 
Comparison to the corresponding CDF measurement in the central rapidity region is also shown in 
the parallel plot, Fig. |5|b. 

Fig. 5 : Comparison of the CTEQ6M fit to the inclusive jet data: (a) D0 (the boundary values 
of the 5 rapidity bins are: 0, 0.5, 1.0, 1.5, 2.0 and 3.0 @; (b) CDF (central rapidity, rj < 0.5) |2|. 
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The agreement between theory and experiment is made even clearer in Fig. |^ where the 

difference between data and theory is plotted. The value for the fit to the jet data, including 

all systematics, is 113/123, combining the two experiments. In Figs. ^ and ^ the error bars are 

®The CCFR F2 measurements have been re-analyzed in a model-independent way . The new data are presented 
in a such a way that correlations with the old F3 measurements fedj are no longer available. 
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combined statistical and diagonal systematic errors .[] The ratio plot comparing to the CDF data is 
not shown; it is similar to that of previous global fits, such as CTEQ5HJ. A detailed study of the 
impact of these jet data on the determination of gluon distributions and the potential for observing 
signals for new physics at the Tevatron and the LHC will be carried out in a separate study. 

Fig. 6 : Closer comparison between CTEQ6M and the D0 jet data as fractional differences. 
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3.1.2 The New Parton Distributions in the MS scheme 

Figure ^ shows an overview of the comparison between the new PDF's and the previous generation 
of CTEQ PDF's, the CTEQ5M1 set, at Q = 2 GeV. 

Fig. 7 : Comparison of CTEQ6M (dashed) to CTEQ5M1 (dot-dashed) PDF's at Q = 2 GeV. 

(The unlabeled curves are u and s = s.) 
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^The D0 jet data involve partially correlated systematic errors which can only be characterized by a full N x N 
covariance matrix V. Our method of correcting for systematic errors on the data points, which depends on writing 
V in a separable form in terms of the K x N coefficients {f3ji}, Eqs. (fi]) and (S), cannot be apply in this case. 
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In order to exhibit the behavior of the PDF's clearly for both large and small x in one single plot, 
we choose the abscissa to be scaled according to x^^^. Correspondingly, we multiply the ordinate 
by the factor x^^^, so that the area under each curve is proportional to the momentum fraction 
carried by that flavor in the relevant x range. We see that the most noticeable change occurs in 
the gluon distribution. 



The gluon distribution Figure § gives a more detailed picture of the changes in the gluon 
distribution at Q = 2 and 100 GeV. For low and moderate values of x, say 10"^ < x < 0.1, the 
most important constraint is due to the rate of Q^-evolution of the DIS structure functions. The 
HERA data in this region are ever improving in accuracy, but the new data has not made a sizable 
change in the gluon distribution, as seen in Figs. and |a. (Below Q = 2 GeV, one may find larger 
deviations between the new and old distributions, but extrapolation of PDF's into that low-Q 
region is well known to be unstable. We will return to this point in Sec. ^I^ .) 

Fig. 8 : Comparison of CTEQ6M (dashed) to CTEQ5M1 (dot-dashed) gluon distributions at 
Q = 2 and 100 GeV. (a) the small-x region; (b) the large-x region. 
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In the moderate to high x range, x > 0.01, the inclusive jet data are now playing a very 
important role. The combined effects of the precision DIS and jet data have made a significant 
shift in the shape of the gluon distribution, as seen in Figs. 0and|b. The new gluon distribution 
is significantly harder than for CTEQ5M1 and ah MRST PDF sets (cf. Sec. U) at all Q scales. 
This behavior is mainly dictated by the inclusive jet data, which lie in the range 50 < Q < 500 GeV 
and 0.01 < x < 0.5. The D0 data in the higher rj bins now allow a higher x reach than the central 
jet data from previous measurements. The hard gluon distribution becomes amplified at lower Q 
scales, due to the nature of QCD evolution and the fact that there is no direct experimental handle 
on the gluon at large x and low Q. The enhanced gluon at large x is similar to the CTEQ4HJ and 
CTEQ5IIJ distributions. However, there is an important difference in the significance of the current 
result: whereas the "HJ" PDF sets were obtained specifically for fitting the high pT jet data, by 
artificially inflating the weights of those points in the global fit, the CTEQ6M gluon distribution 
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results naturally from the new global fit without any such special emphasis. The visually good fits 
seen in Figs. |^ and ^ are quantitatively substantiated by the small 'X^ /N value of 113/123 for the 
jet data sets. Since CTEQ6M represents the "best fit" in the global analysis, this gluon behavior 
is also fully consistent with all DIS and Drell-Yan data sets used in the fit, as discussed previously. 

3.1.3 DIS and LO Parton Distributions 

For most applications, the MS parton distributions are the most appropriate. But for certain 
applications, PDF's in the NLO-DIS scheme are preferred. For these purposes we have obtained 
CTEQ6D by performing independent global fits in the NLO-DIS scheme. Although, in principle, 
one could obtain NLO-DIS parton distributions by a simple transformation from a NLO-MS' set, 
the reliability of such a procedure is uncertain in x regions where the numerical values of the PDF's 
for different flavors (which transform into each other) are orders of magnitude apart It is thus 
preferable to perform an independent fit. The quality of fit for the NLO-DIS fit is comparable to 
that of the MS case. 

Leading-order parton distributions are useful for Monte Carlo event generator applications. 
For this purpose, we produce a set of LO PDF's, CTEQ6L. These are obtained in a global fit with 
leading order hard cross sections. In this fit, the NLO ag (with as{mz) = 0.118) is used for the 
QCD coupling, rather than the LO as- This combination of choices of LO hard cross sections and 
NLO as conforms with the prevailing practice in MC event generator applications. 



3.2 Eigenvector PDF Sets For Uncertainty Analyses 



As mentioned in the Introduction and in Sec. |2.3| , an important goal of this work is to advance 
global QCD analysis to include systematic and quantitative estimates of uncertainties — on both 
PDF's and their physical predictions. For this purpose, we characterize the behavior of the global 
function in the neighborhood of the minimum by a set of eigenvector PDF sets, according to 
the method of |jll| (cf. the illustration in Sec. |2.3| ). 

The eigenvector sets are obtained in two steps. First, the full set of parameters described 



in Sec. ^ is probed with the iterative procedure of 1 10, 1^, in order to identify those parameters 
that are actually sensitive to the input data set.^ With current data, and our functional form, 20 
such parameters are identified. We then generate the eigenvector PDF sets in the 20 dimensional 
parameter space as described in p^,[TT|, with the remaining parameters held fixed. This results in 
40 PDF sets, a -|- (up) and a — (down) set for each eigenvector direction in the parameter space, 
in addition to the central CTEQ6M set. Ideally, in the quadratic approximation of the Hessian 
approach, the curves would be symmetric around the minimum, so that only one displacement 



A recently discovered |34[| error in the QCD evolution of DIS scheme PDF's in previous CTEQ analyses has been 
corrected in this work. This error has negligible practical consequences in physical applications in the current energy 
range, because the small deviation in evolution was naturally compensated by the fitted PDF's. As long as the same 
DIS hard cross sections are used in the fitting and in the applications, as is indeed the case, the same physical cross 
sections will be reproduced. We thank G. Salam for useful communications on this issue. 

®The sensitive parameters are those that are not close to "flat" directions in the overall parameter space, as 
mapped out by the iterative diagonalization procedure of px| ]. 
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would be needed for each eigenvector. However, we observe some asymmetry in certain directions 
in practice, so we have decided to generate both up and down sets in each eigenvector direction; 
this provides more information on the behavior of constant-x^ surfaces in the neighborhood of the 
minimum.|3 The up/down sets, called , , . . . , 5^, 5*20, correspond to a tolerance of T = 10 (cf. 
Ref. |jll|); I.e., their value is greater than the minimum by = 100. The range of uncertainty 
corresponding to the choice T = 10 represents our estimate of the range of PDF behavior that 
is consistent with the current data. Details of the error analysis that leads to this estimate are 



described in Appendix B.4. 



3.2.1 Uncertainties of PDF's 

We use the eigenvector PDF sets {S^} to estimate the uncertainty range of physical quantities (and 
of the PDF's themselves) according to the master equation (|^ which is derived in |11|. In Figs. ^ 
and |l^ we show fractional uncertainty bands for the u-quark, d-quark, and gluon distributions, 
respectively, at = 10 GeV^. In these plots, the shaded region is the envelope of allowed variation 
of the parton distribution, independently for each value of x, with tolerance T = 10. The ordinate 
is the ratio of the extreme value (up or down) to the standard CTEQ6M value. For comparison, 
the curves are CTEQ5M1 (dashed) and MRST2001 (dotted), plotted in ratio to CTEQ6M. 



Fig. 9 : Uncertainty bands for the u- and d-quark distribution functions at 
The solid line is CTEQ5M1 and the dotted hue is MRST2001. 
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The u distribution is the most accurately known of the parton distributions, since deep- 
inelastic scattering by photon exchange, being proportional to the square of the quark charge in 
leading order, is most sensitive to the u quark. The d-quark distribution is very much affected by 
the various data sets that are sensitive to u-d differences: the NC and CC DIS measurements with 
proton and deuteron targets, the DY p/d asymmetry and the W-lepton asymmetry experiments. 
The d-quark uncertainty band is seen to be noticeably wider than that of the u quark, particularly 



This asymmetry has been encountered in some applications, e.g. Ref. [ 
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at large x where there are few constraints on the ratio between the two flavors. This result provides 
quantitative confirmation of a previous study ||3^, addressing the issue of the behavior of the 
d-quark distribution at large x raised in []37| . 

The gluon distribution is the most uncertain of the PDF's — notwithstanding the increased 
constraints from recent precision DIS and inclusive jet data discussed earlier — as shown in Fig. |l^. 
The uncertainty is of order ±15% for x values up to ~ 0.3, and then it increases rapidly for large 
X. This uncertainty is of much interest for the physics programs of the Tevatron and the LHC, and 
it has been the subject of several previous studies |^^. With the new quantitative tools at hand, 

Fig. 10 : Uncertainty band for the gluon distribution function at = lOGeV^. The curves 
corresponds to CTEQ5M1 (solid), CTEQ5HJ (dashed), and MRST2001 (dotted). 
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we have found that it is important to use a sufficiently flexible parametrization of the initial distri- 
bution, in order to obtain a reliable estimate of the range of uncertainty of the gluon distribution. 
This point will be relevant when we compare the above result with other recent studies of the gluon 
distribution (cf. Sec. 5^). Some details are given in Appendix^ 

The uncertainties shown in Figs. ^ and ^ are significantly smaller than those obtained in 



our previous study |11| based on CTEQ5 inputs. The reduced uncertainty, which can be seen by 



comparing Figs. ^ and |l^ with their counterparts (Figs. 3 and 4) of Ref. [11|, is a consequence of 
the new DIS and jet data.[^ 

An important point in the interpretation of Figs. |9| and 10 is that the uncertainty ranges 
correspond to the envelopes of possible parton distributions that are consistent with the data. A 
distribution function that produces the extreme at any particular value of x is generally not extreme 
at other values of x. Thus, a PDF that follows the upper or lower boundary of the uncertainty 
band at all x would definitely not be consistent with the data. 

The CTEQ5M1 and MRST2001 distributions, plotted as ratios to CTEQ6M, are also shown 

on Figs. ^ and for comparison. Both these PDF sets are generally within the current uncertainty 

^^One must be aware that the uncertainties shown in jl^ are for a higher value of Q = 10 GeV. The uncertainties 
decrease when evolved to higher Q. 
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bounds. The hard gluon distribution in the CTEQ6M fit, discussed earher, manifests itself in the 
fact that both CTEQ5M1 and MRST2001 curves are close to the lower limit of the uncertainty 
band at large x. For comparison, we have also included the gluon distribution of CTEQ5HJ in 
Fig. |l^. It is even harder than that of CTEQ6M at x > 0.3, but still within the uncertainty band. 
Comparisons of CTEQ6M and CTEQ5M1 fits are discussed in the next subsection ( 3.3.1| ). The 
MRST2001 distributions are rather similar to CTEQ5, except at very small x where the gluon is 
smaller and the quarks are larger, both slightly outside the uncertainty ranges. Differences between 
CTEQ6 and MRST2001 fits are discussed in some detail in Sec. p.2\ 



3.3 Issues And Comments 

3.3.1 How much progress has been made? 

The fact that the new PDF's, especially the quark distributions, do not appear to differ much from 
the CTEQ5 functions testifies to the steady convergence of QCD analysis of PDF's. Progress in 
this undertaking must be measured in more precise terms than previously, because of the increasing 
demands for quantitative applications in precision SM studies and new physics searches. To illus- 
trate this point, we first compare the predictions of the preceding generation of PDF's, CTEQ5M1, 
with two key new experiments, HI and D0, in Figs. |ll| and Plots like these are often used in 
the literature as evidence of good or acceptable fits. Careful comparison with the corresponding 
results for CTEQ6M (Fig. ^ and Fig. |6|) reveals that the new fit is discernably better. 

Fig. 11 : Comparison of the CTEQ5M1 predictions to the recent HI data. Cf. format of Fig. 0. 
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The improvement in the quality of the fits can be quantified by examining the ^ values. With 
the inclusion of correlated systematic errors these numbers now carry more statistical significance 
than before. The accompanying table lists the CTEQ6M and CTEQ5M1 x^'s for the full data 
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sets used in the current global analysis, and for the two representative experiments, HI and D0, 
shown in Figs. 11 and The overall is greater by 268 (for 1811 data points) for the previous 
generation of PDF's than the current one. The difference is not very large but it is outside our 
estimated tolerance of T = 10 (Ax^ = 100). 



Fig. 12 



Comparison of the CTEQ5M1 predictions to the recent D0 data as in Fig. ||. 
(data-CTEQ5Ml)/CTEQ5Ml versus px [GeV] 
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The increases in the individual x^'s for the two precision experiments, although not very large, 
do lie outside the 90% probability range according to the uniform way we adopt for evaluating the 



uncertainty ranges, discussed in Appendix B.4. These numbers provide a quantitative measure of 
the improvement achieved. The precise interpretation of these numbers is still open to discussion, 
however, and further progress is still needed (cf. Appendix | 





Total 


HI 


D0 


data pts.) 


(1811) 


(230) 


(90) 


CTEQ6M 


1954 


228 


65 


CTEQ5M1 


2222 


285 


117 



3.3.2 Higher twist terms? 



4C]. If power- 



There have been several studies of higher-twist (HT) effects in DIS experiments ||5|, 
law corrections to leading-twist PQCD are needed, they will introduce additional nonperturbative 
degrees of freedom in the global analysis. This would complicate PDF analysis considerably, because 
the extracted PDF's would then depend on the HT model. Because there is no accepted theory of 
HT, and HT terms are probably non-factorizable and process-dependent, PDF's obtained with the 
inclusion of HT terms would no longer be universal. 

In the absence of firm theoretical guidance, we first limit the possible size of higher-twist 
terms by placing reasonable cuts on the kinematic range of data utilized in the fitting program. 



^^In evaluating these numbers for the two fits, the overall normalizations of all the data sets are allowed to 
re-adjust, within errors, between the two calculations. Otherwise, the difference in would be larger. 
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We then study phenomenologically the need for HT corrections by comparing the quahty of fits 
with and without HT parameters. Using the same kinematic cuts as in previous CTEQ analyses 
(in particular, Q > 2 GeV and W > 3.5 GeV for DIS data) we find that the inclusion of simple 
phenomenological HT factors, of the type used in the previous literature ||5, 3^, 4C], does not produce 



discernable improvement in the quality of the fit. We conclude that HT corrections are not needed, 
and therefore stay within the twist-2 PQCD formalism for our analysis. Our study of this issue 
is summarized in Appendix |C|. We believe, for the reasons mentioned in the previous paragraph, 
that should there be evidence for HT effects, it would be more desirable to raise the kinematic cuts 
and preserve the universal PDF's, rather than to introduce ad hoc HT terms that would reduce the 
usefulness of the PDF's. 

3.3.3 Signs of anomalies at large x or small x? 

One of the important goals of quantitative global analysis of PDF's is to provide stringent tests of 
the validity, and the efficacy, of PQCD. Anomalies observed in one or more of the hard processes 
in the global analysis could indicate signs of new physics. 

Among the hard processes in current global analyses, the high-i?T jet cross section measured 
by CDF and D0 probes the smallest distance scales, and hence provides the best window to 
discover new physics at large energy scales. There was some excitement when CDF first measured 
a possible excess in the high-i?T jet cross section, compared to conventional PDF predictions of 



that time [41|. This excitement has abated with the advent of the "PDF explanation" (the 
CTEQ4HJ and CTEQ5HJ type of gluon distributions Q) and the subsequent D0 measurements 
P3| (which do not show as pronounced an effect). However, the issue remains an interesting one 
since the CTEQ4HJ/5HJ gluon distributions have not been universally accepted as the PDF of 
choice. Where does the issue stand in view of the recent data and global analyses? 



As already mentioned in Sec. p.l.2| above, the CTEQ6M central fit arrives at a gluon distri- 
bution that is considerably harder at high x than the conventional ones (such as CTEQ5M and 
MRST2001). It is more like CTEQ4HJ and CTEQ5HJ, although obtained without giving any extra 
weight to the jet data. This change comes about from three developments: (i) the statistical power 
and the expanded range of the new D0 jet data set in the five separate rapidity intervals |16|; 



(ii) the increased flexibility of the nonperturbative gluon distribution in the new parametrization 
adopted for this global analysis (cf. Sees. |2.5| , 5.2 and Appendix 0); (iii) the fact that the harder 



gluon is needed to provide a better fit to the D0 data. On one hand, this result provides further 
support for the PDF explanation of the high-E'r jet cross section. On the other hand, by the more 



quantitative uncertainty analysis shown in Fig. 10, there is still a large range of possible behavior 



for the high-x gluon, allowing both the conventional and the "HJ-like" shapes. Further work will 
be required to draw a firm conclusion on this important issue. Continued search for signs of new 
physics must be pursued with vigor in all channels. 

Signals for a departure from conventional PQCD could also appear in the very small x region. 



^■'This conclusion may appear to contradict some common lore about HT. We note that most of the previous 
studies either were not done with as wide a range of data sets, or did not incorporate the full correlated systematic 
errors. Also, we use a relatively high cut of 4GeV'^, precisely to reduce the HT effects. 
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Our analysis is based on the standard NLO DGLAP formalism. The impressive agreement between 
the fits and all the data sets shows no indication of a breakdown of conventional PQCD; nor do we 
detect any necessity for including NNLO corrections |^4|. This does not mean that effects beyond 
the simple theoretical model are not present. It does mean that the search for these effects must 
depend on even more precise, and extensive, experimental and theoretical input. 



3.3.4 Determination of a,? 



As mentioned in Sec. 2^, for our standard analysis we input the strong-coupling strength as{mz) = 
0.118, based on dedicated measurements from QCD studies at e~^e~ colliders and sum rules in 
lepton-hadron processes. It is desirable to check that this value is consistent with our global 
analysis, and, beyond that, to see whether the global analysis can provide a useful independent 
measurement of Og. 

For this purpose, we have repeated the fitting with different choices of as{mz)- The resulting 
variation of the global is shown in Fig. 13. The minimum in occurs at a value of as{mz) — 
0.1165, somewhat lower than the world average from precision measurements. For our choice 
of as{mz) = 0.118, is greater than the minimum by about 5. This difference is completely 
insignificant with respect to our estimated tolerance of Ax^ < 100. Thus, the value of as{mz) 

favored by global PDF analysis is in excellent agreement with the world average. Furthermore, 



according to Fig. 13 and our adopted tolerance, the allowed range of asimz), as measured in global 
PDF analysis, is about 0.110 to 0.123 (i.e. as{mz) = 0.1165 ±0.0065). This is not competitive with 
other dedicated measurements. The basic reason is well known: the uncertainty of Og determination 
in QCD analysis of parton distributions is strongly tied to the uncertainty of the gluon distribution, 
which still has substantial uncertainties. 



Fig. 13 : x^ vs. as{mz) curve, with two alternative definitions of the NLO 
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We also note that the definition of NLO as(/u) in perturbative QCD is subject to an ambiguity 
relating to the solution of the renormalization group equation (RGE) . We have used the conventional 
MS formula for asifJ-), obtained by solving the NLO RGE as an expansion in inverse powers of In /j, 
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. An alternative definition, favored by some, and more convenient for higher-order extensions, is 



to numericahy solve the truncated RGE equation for ^da/d^ at NLO. The dotted curve in Fig. 13 
shows versus as{mz) using this alternative definition of The difference between the two 

treatments is seen to be within the range of uncertainties. 

3.4 User Interface 

All PDF sets described above will be available (at http://cteq.org) in the usual CTEQ format, 
using external data tables. These can be used in the same way as previous CTEQ PDF's. 

In addition, to facilitate progress toward a universal user interface, we are preparing a new 
format, which uses an evolution program and small external files containing only the coefficients 
for the initial parton distributions, following the "Les Houches Accord on PDF's" [ }45[ . 

4 Physical Predictions and their Uncertainties 

The main utility of the new parton distribution functions is to make predictions on physical cross 
sections and their uncertainty ranges — both for precision SM studies and for new physics searches. 
Detailed applications go beyond the scope of this paper. We will, however, present some general 
results on estimated uncertainties of various parton luminosity functions for the Tevatron (Runll) 
and the LHC. From these results, one can easily estimate the expected uncertainties of a variety of 
physical processes of interest for these hadron collider programs: W/Z cross sections, Higgs, and 
top (both single and pair production) rates due to various production mechanisms, etc. 



Figure 14 a shows the fractional uncertainties of the gluon-gluon and quark-antiquark lumi- 



nosity functions, to final states with the quantum numbers of W^, W , 7* and Z^, at the Tevatron. 




Fig. 14 : Uncertainties of the luminosity functions at the Tevatron. 
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Figure similarly displays the quark-gluon luminosity functions for , W" , Z final-state 
quantum numbers. As expected, the gluon-gluon luminosity has the greatest uncertainty, especially 
at large reflecting the uncertainty of the gluon distribution at large x. The qq — > W~^, W~ Z 
luminosity uncertainties are almost constant throughout the range of \/I plotted, at the ±4% level. 
The qG W~^, W~ Z luminosity uncertainties resemble the geometric mean between those for 
GG and qG. The four qG uncertainty bands are very similar to one another, but close examination 
shows that they are not identical, because of the differences in quark flavor mix. 



Similarly, Fig. 15 shows the fractional uncertainties of the corresponding luminosity functions 
for the LHC. The qq luminosity uncertainties are also fairly constant, being ±5% for 

qq W+ and ±4% for qq W~ for WO < V§ < 200 GeV. 

Fig. 15 : Uncertainties of the luminosity functions at the LHC. 
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The gluon-gluon luminosity uncertainty is ±3% at the narrowest point (vs ~ 250 GeV), increasing 
to ±10 % at both ends of the \/I range shown. From this, one can estimate that the uncertainty 
on the production cross section of a light mass Higgs particle (say, 100—200 GeV) at the LHC due 
to PDF's is on the order of ±5%. The uncertainty of the gluon-gluon luminosity function at the 
LHC is seen to be generally smaller than for the Tevatron case. The difference results from the 
combination of two effects: the uncertainty decreases with evolution, and the contributing x ranges 
for the two cases are somewhat different. The luminosity uncertainties for the Compton processes 
at the LHC, Fig. p^b, are again rather similar to one another. The uncertainties range from around 
±6% at low energies to ±2 - 3% at \/I ~ 500 GeV. 

The W cross section can be measured with great precision at hadron-hadron colliders. The 
cross section is large and the backgrounds are relatively small. In addition, the theoretical uncer- 
tainties are small. This makes the W cross section an ideal benchmark with which to normalize 
other cross sections, especially as there remains a significant uncertainty as to the value of the total 
inelastic cross section at the Tevatron. (The inelastic cross section is used by the experimental col- 
laborations to normalize all luminosity calculations. CDF and D0 currently assume values for this 
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cross section that differ by 6%, producing cross-section differences between the two experiments of 
the same amount.) The simple estimates of the W cross section uncertainty the PDF uncertainties 
at the Tevatron and the LHC, given above, can be improved by including NLO terms and by using 



the more precise Lagrange method [12|. Using the 40 Hessian eigenvector sets, the uncertainty for 
any cross section can be calculated. This procedure will be made more straightforward with the 
adoption of the Les Houches accord on PDF's. 

5 Comparison with Other Parton Distribution Analyses 

5.1 Previous Uncertainty Studies 

The quantitative analysis of uncertainties of PDF's, taking into account the experimental systematic 
errors, was pioneered by in the context of detailed study of the precision DIS experiments. 

These studies were based on minimization techniques. The methods that we developed in [|lO[ - 
|l2|, and used in this paper, have extended this approach to a global analysis (cf. the recent review 
|p6|| .) Because the scope of the experimental data used in the earlier studies is different from ours, 
direct comparison with those results is not possible. 

Giele et al. (GKK) advocate a more general approach, using a pure probabilistic analysis with 
random sampling of the parton parameter space 0]. While theoretically appealing, the proposed 
method encounters difficulties when confronted with real experimental data sets, which either do not 
meet the strict statistical criteria or are mutually incompatible according to a rigorous probabilistic 
interpretation of the errors. As a result, although the methodology is clearly established, it has not 
yet produced practical results that can be generally used. 

In an ideal world, where all experimental data sets are self-consistent and mutually com- 
patible, the GKK method is equivalent to the one we use, provided, in addition, the Gaussian 
approximation underlying the approach is valid. If, however, is not approximately quadratic 
in the neighborhood of the minimum, and if the precise response functions for all experiments in 
the global analysis are known, then the GKK method would be superior. Minimization of could 
still be useful in that case, but the accuracy is only as good as the Gaussian approximation. In 
reality, of course, the detailed experimental response functions are rarely, if ever, known. So re- 
alistically the study of PDF uncertainty will necessarily consist of quantitative statistical analysis 
supplemented by pragmatic choices based on physics considerations and experience. 

5.2 Comparison to MRST2001 

Our results can be compared more directly to those of the recent MRST2001 analysis An 
overview of the comparison between the CTEQ6M and MRST2001 parton distributions is shown 
in Figure Significant differences appear in the gluon distribution, similar to the differences 
between CTEQ6 and CTEQ5 described in Sec. |3.1.2| . Unlike CTEQ5, the MRST analysis uses 



essentially the same recent experimental input as CTEQ6. Thus it is natural to seek the sources 
of the observed differences. 

Although most of the experiments used in the CTEQ6 and MRST2001 analyses are the same, 
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there are some differences both in data sets and in the way that experimental error information 
is utihzed. The CTEQ6 analysis uses the separate-energy data sets of the BCDMS and NMC 
experiments (for which information on correlated systematic errors is available) whereas MRST 
uses the combined-energy sets for these experiments. The CTEQ6 analysis does not use SLAC 
experiments, because that data lies in the region of very low Q^. Also, CTEQ6 does not use the 
HERA charm-production data, because the errors are still large, and because a proper theoretical 
treatment of this process requires the full inclusion of charm mass effects in the matrix elements.^ 
In addition, the kinematic cuts on data selection are somewhat different. 

Fig. 16 : CTEQ6M PDF's compared to MRST2001. (The unlabeled curves are n and s = s.) 




X 



There are several differences in methodology. In the analysis of data, the published experi- 
mental correlated systematic errors are fully incorporated in the CTEQ6 global fitting procedure. 
In the theoretical calculations, we use the conventional (zero-mass parton) Wilson coefficients for 
DIS structure functions for reasons explained in Sec. whereas MRST uses the Thorne-Roberts 
version of the VFNS calculation . Finally, the two groups use different functional forms for the 
parametrization of the nonperturbative input parton distributions. 

Beyond a visual inspection of the PDF's, insight can be gained by comparing the two fits 
for data sets used in both analyses. In principle, one might just compare the values of the 
corresponding experiments for the two fits, cf. numbers given in Appendix ^ and Table 1 of 
Ref. In practice, a direct comparison between the DIS experiments is not possible because the 
two analyses have different treatments of the experimental systematic errors and different choices 
of Wilson coefficients mentioned above. Among experiments that can be directly compared, two 
differences are noticeable. The ffist experiment is E605 (Drell-Yan): /N is given as 232/136 in 
MRST2001 |13|, and 95/119 for CTEQ6M.|3 However, the seemingly large MRST number was due 
to their use of statistical errors only for this experiment. The two fits are actually comparable for 
^*Work is under way for inclusion of charm production in a separate global analysis. 

^^The differences in the number of data points are due to slightly different kinematic cuts in the two analyses. 
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this experiment. The second is the combined D0 and CDF jet cross section measurements: /N is 
170/113 for MRST2001, and 113/123 for CTEQ6M. This difference is statisticahy quite significant, 
and it is responsible for the rather distinct gluon distributions obtained by the two groups. We 
beheve that this difference is primarily due to the parametrization of the nonperturbative parton 
distributions, especially the gluon, as will be discussed below and in Appendix 0. 



We now comment more specifically on a number of issues highlighted in the MRST paper 
the perspective of the current study. 



from 



Large-x gluon behavior The inclusive jet data have a strong influence in the determination 
of the gluon distribution at medium and large x. The MRST analysis found a rather significant 
competition between the jet data and the Drell-Yan (E605) data in the context of their global fit. 
This requires either a compromise, the MSRT2001 set, or a good fit to the jet data with a much 



poorer Drell-Yan fit (and a rather unusual gluon shape), the MRST2001J set |13]. 

This dilemma is not observed in our analysis. The CTEQ6M fit to both the jet data and the 
Drell-Yan data appear nearly "ideal" (i.e., /N ~ 1), and the gluon shape is rather smooth. This 
clear difference results mainly from the parametrization of the PDF's, especially that of the gluon. 
We are led to this conclusion by a study where we adopt the MRST parametrization for the gluon 
and repeat the global fit. We then obtain a "best fit" in which the gluon has a shape similar to 
MRST2001J. The gluon distributions at Q = 2 GeV for CTEQ6M, MRST2001, MRST2001J, and 
the MRST-like fit that we obtain by adopting the MRST parametrization are compared in Figure 
17. The "wiggle" of the gluon function observed in the latter two fits (which led MRST to reject the 
MRST2001J fit [|l^]) is clearly due to the particular functional form. More details, including other 
examples and results aX Q = 1 GeV (which are more sensitive to the differences), are presented in 
Appendix ^ on the question of parametrization. 

Fig. 17 : Comparison of the gluon distributions at Q = 2 GeV, for CTEQ6M (sohd), MRST2001 
(long-dashed), MRST2001J (short-dashed) and the MRST-hke fit (dotted). 
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Gluon behavior at small x and small Q The MRST study found a strong preference for a 
negative gluon distribution at small x and small Q. It has been known for a long time that QCD 
evolution is extremely rapid at small Q. This makes extrapolation backward to Q = 1 GeV quite 
unstable |46|, even though the behavior of the PDF's at Q > 2 GeV is rather tightly constrained 
by the experimental data. It is generally agreed that there is no theoretical requirement that 
parton distributions be positive definite at any particular Q, as long as cross section predictions 
stay positive. The issue is, thus, only a phenomenological one. 

We have chosen Qq = 1.3 GeV for the CTEQ6 fit, and find no indication for G{x,Q) going 
negative in the region Q > 1.3 GeV. To consider lower values of Q, we have made alternative fits 
with Qq down to 1.0 GeV which are described in Appendix The results can be summarized 
as follows: (i) Whether or not the gluon distribution tends to go negative is strongly dependent 
on the choice of Qq — it will always go negative at some low Q, as anticipated in ||4^. (ii) With 
Qo = 1 GeV, parametrizations that allow for a negative gluon at small x can yield slightly lower 
than those with positive-definite gluons. This has no phenomenological significance, however, since 
the difference between the resulting gluon distributions is much smaller than its uncertainty in this 
region. The uncertainty assessment is closely related to the study of parametrization described in 
Appendix Note, in particular. Fig. 



Measurement of MRST performed an analysis to determine as{mz) from their global fits, 
obtaining as{mz) = 0.119 ± 0.002 (expt.) ± 0.003 (theory). As discussed in Sec. 3.3.4 , our study 
obtained a lower central value and a substantially larger error estimate. The latter (±0.002 vs. 
±0.0065) is mainly due to the different choices of allowed Axtot (20 vs. 100). In fact, in order to 
achieve Aas{mz) = ±0.002 (expt.) from Fig. |l^, we would have to use Axtot ~ 10- view of 
our study of the overall tolerance parameter discussed in Sec. B.4, we cannot justify having such 
a narrow range for as{mz), since many perfectly acceptable global fits exist within the full range 
as{mz) = 0.1165 ± 0.0065, even without including theoretical uncertainties. 



6 Concluding Remarks 

In this paper, a new generation of parton distribution functions has been presented, making full 
use of the constraints of both the old and new data. This global analysis significantly expands 
the scope of the currently used PDF analyses on two fronts: (i) a full treatment of available 
experimental correlated systematic errors, and (ii) a systematic treatment of uncertainties of the 
parton distributions and their physical predictions, using a recently developed eigenvector-basis 
approach to the Hessian method. Thus, in addition to obtaining new best-fit PDF's, the ranges of 
uncertainties of the PDF's and their physical predictions are systematically assessed. 

Among the improvements made in PDF determination, progress on the gluon distribution 
is particularly worth noting. The new D0 jet data play a significant role in obtaining a hard 
gluon distribution at large x — quite different from previous standard fits. We have shown that, at 
= 10 GeV^, its uncertainty has been narrowed by recent precision data to ±10% at small x (up 
to X ~ 0.3). The uncertainty then increases quite rapidly (in spite of the improved constraints), 
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reaching a factor of 2 by x ~ 0.5. The need for additional precise data to pin down the gluon 
distribution over a larger x range is obvious. The excellent agreement between the new D0 non- 
central jet cross sections and the precision DIS and DY data sets, within the PQCD framework 
using the new CTEQ6 PDF's, is significant. It lends credence to the PDF explanation of the 
earlier CDF high pT jet measurement. A crucial element in achieving this consistent picture is to 
allow sufficient flexibility in parametrizing the nonperturbative parton distributions, in order to 
accommodate a range of possible behaviors at both low and high x. 

The eigenvector basis PDF's obtained by this analysis allow the calculation of the uncertainty 
of any physical quantity that depends on the PDF's. We have presented results on parton-parton 
luminosity function uncertainties for both the Tevatron and the LHC. From these, uncertainties of 
a wide range of SM and new physics signals and backgrounds can be inferred. For instance, the 
uncertainties on the production cross sections of W, Z at the Tevatron and the LHC are estimated 
to be ±4% and ±5% respectively, and that of a light Higgs at the LHC to be ±5%. More detailed 
calculations on the predicted cross sections and their uncertainties for processes of interest can 
easily be done, using the standard CTEQ6M and its associated eigenvector basis PDF's, or using 
the Lagrange method described in Ref. [12|. 

This work marks only the first step in the effort to advance the global analysis of PDF's, 
incorporating the systematic treatment of uncertainties. Improvements and applications of the 
results discussed in this paper are worth pursuing. It is also important to advance the theoretical 
input to the global analysis, including resummation effects, improved treatment of heavy quarks, 
and higher-order corrections. The current analysis is carried out at NLO and achieves a satisfactory 
fit to all of the data sets over a wide kinematic range. In the future, an extension to NNLO will be 
possible but this must wait until more cross-section calculations are available at the higher order, 
and until the precision of the experimental measurements makes it useful. In the near future, new 
data from both the Tevatron and HERA, along with further theoretical development, will allow for 
even more precise determinations of PDF's, leading to more useful predictions for LHC physics. 
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A Details of the CTEQ6M fit 



The coefficients for the nonperturbative PDF's at Qo 
CTEQ6M fit are: 



1.3 GeV, as defined in Sec. 2.5, for the 







Ai 


A2 


^3 


^4 


^5 




1.4473 


0.6160 


4.9670 


-0.8408 


0.4031 


3.0000 


Uy 


1.7199 


0.5526 


2.9009 


-2.3502 


1.6123 


1.5917 


9 


30.4571 


0.5100 


2.3823 


4.3945 


2.3550 


-3.0000 


u + d 


0.0616 


-0.2990 


7.7170 


-0.5283 


4.7539 


0.6137 


s = s 


0.0123 


-0.2990 


7.7170 


-0.5283 


4.7539 


0.6137 


d/u 


33657.8 


4.2767 


14.8586 


17.0000 


8.6408 





The best fit to data is obtained with the following normalization factors for the experiments:]^ 



Expt.(e) 


BCDMS 


Hla 


Hlb 


ZEUS 


NMC 


CCFR 


E605 


D0 


CDF 


Norm(e) 


0.976 


1.010 


0.988 


0.997 


1.011 


1.020 


0.950 


0.974 


1.004 



Table shows the values for those experimental data sets for which detailed information on 
systematic errors is available and used in the CTEQ6M fit. For each data set (e), we show: the 
number of data points (A'^e), the value for that experiment in the CTEQ6M fit (Xe); ^"^^ Xe/-^e- 

Table 1: Comparison of the CTEQ6M fit to data with correlated systematic errors. 



data set 




xl 


Xl/Ne 


BCDMS p 


339 


377.6 


1 


114 


BCDMS d 


251 


279.7 


1 


114 


Hla 


104 


98.59 





948 


Hlb 


126 


129.1 


1 


024 


ZEUS 


229 


262.6 


1 


147 


NMC F2p 


201 


304.9 


1 


517 


NMC F2d/p 


123 


111.8 





909 


D0 jet 


90 


64.86 





721 


CDF jet 


33 


48.57 


1 


472 



These results form the basis for much of the quantitative uncertainty analysis discussed in this 
paper. The interpretation of the ZEUS and NMC x^'s are studied in some detail in Sec. B.3| . 



T7 



^'^Hla and Hlb refer to HI data sets for low-Q and high-Q measurements respectively 

^^The fairly large value of Xe/-^e for the CDF data has been studied extensively by the experimental group 
A substantial part of the excess is due to fluctuations of a few points at moderate pt- The best value of Xe/^( 
that can be achieved for this data set (using a high order polynomial curve fit to the data) is approximately 1.3. 
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Data sets with only effective uncorrelated errors are CCFR,|^ E605, E866, and CDF W-lepton 
asymmetry. Tlie nominal Xe I ^e. for these data sets for the CTEQ6M fit are (150/156, 95/119, 
6/15, 10/11) respectively. 



B Comparison of Theory and Data with Correlated Errors 

This Appendix consists of (i) a brief summary of the formalism used in our error analysis — both 
for the calculations in performing the global fits and for the interpretation of the results of 
those fits; (ii) an examination of the significance of the observed higher-than-nominal x^'s for two 
of the DIS experiments; (iii) a discussion of the estimated tolerance for our uncertainty estimates. 



Further details of the formalism can be found in Ref. |12| 



B.l Useful formulas for and the analysis of systematic errors 

The simplest function, used in most conventional PDF analyses, is 

2 (A — Tif 
^0=2^ 2^ ^72 — (6) 

expt. i=l * 

where Di is a data value, Tj is the corresponding theory value (which depends on the PDF model 
parameters {a}), and a'^ is the combined statistical and systematic errors (assumed uncorrelated 
and usually added in quadrature) on the measurement Di. This effective function provides a 
simple measure of goodness-of-fit, convenient for the search for candidate PDF sets by minimization. 
However, it is not useful for estimating the uncertainties associated with those candidates because 
it does not contain enough information to allow a meaningful statistical inference based on the 
increase in away from the minimum. 

Most DIS experiments now provide more detailed information on measurement errors. For 
each data point i, we have the statistical error Cj, uncorrelated systematic error Uj, and several 
(say, K) sources of correlated systematic errors {/3ij,/32i, • • • il^Ki}- The best fit to the data (i.e., 
the fit with least variance) comes from minimizing the x^ function,^ 



expt. 



fc=l / k=l 



(7) 



where a? = a? + is the combined uncorrelated error. The fitting parameters are (i) the PDF 
model parameters {a}, on which Tj depends, together with (ii) random parameters {r} associated 
with the sources of correlated systematic error. The point of Eq. (0) is that Di has a fiuctuation 
Z^fe '^kiiki due to systematics. The best estimate of this shift is obtained by minimizing x'^ with 
respect to the set {r^}. In practice, the total number of such parameters for all experiments 
included in the global analysis can be quite large. Adding these to the already large number of 
PDF parameters {a} (which represent the real goal of the analysis) one encounters a formidable 



See footnote |, Sec. 

The prime is to distinguish it from the its simphfied — but equivalent — form, , derived below in Eq. 
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minimization task, involving a parameter space of dimension close to 100. The practical difficulties 
have considerably hindered past efforts using this approach. The stability and reliability of results 
obtained this way can also be questioned. 

We pointed out in Ref. that the minimization of the function with respect to {r} can 
be carried out analytically. This simplifies the global analysis to its irreducible task of minimization 
with respect to the PDF parameters {a} only. In addition, the analytic method provides explicit 
formulas for the optimal values of {r^ , = 1 . . . K\ due to the systematic errors k = \ . . .K that 
are associated with the fit. These optimal shifts are obtained from the condition dy^ /dr^ = 0, and 
the result is 

K 

rk{{a}) = Y.{A-^),,,By . (8) 
fc'=i 

Here {B^} and {A^k'} are given by 

Bk{{a}) = 2 ^""^ ^kk' = Skk' + >^ n— ■ (9) 

1=1 ' 1=1 ' 

Substituting the best estimates (§) back into x''^, we obtain a simplified function. 



x'({a})^x"(W,WW)}) = E { 

expt. 



E^^^- E B'^i^-')kk'B^'\- (10) 



i=l k,k'=l 



Minimizing x^({o}) with respect to the PDF model parameters {a} is equivalent to minimizing 
x'^({ci}, {r}) with respect to both {a} and {r}. This procedure provides a much more streamlined 
way to obtain the best PDF fit. This formalism, quoted in the main text as Eq. (||), forms the 
basis of our global analysis. 

Assuming the measurements Di, ai and Pki are in accord with normal statistics, the 
function defined by Eq. ( p!0[ ) (or (|^)) should have a standard probabilistic interpretation for a 
chi-squared distribution with degrees of freedom, for each experiment. 

B.2 Useful tools for evaluating fits and interpreting results 

Even when the real experimental errors do not behave in the textbook manner, as is often the 
case, the formalism developed above provides useful tools for evaluating the quality of the fits 
and interpreting their results. It also provides hints on reasonable ways to deal with less-than- 
ideal cases. We describe some of these tools which are used in the presentation of our fit results 
throughout this paper, and in the error analyses to be discussed in subsequent sections. 

First, for any fit, Eq. (P) provides a best estimate of each of the systematic errors {rk,k = 
l...,K}, which we shall denote by {r^}. For each k, the parameter should be of order 1, 
because the probability for to be ^ 1 by a random fluctuation is small. If any turns out to 
be noticeably large, the estimate of the systematic errors {/3fcj} is suspect. 

Second, once the parameters that minimize the x"^ functions have been determined, we can 
streamline the comparison of the flt with data — thus evaluate the quality of the fit — by rewriting 
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Eqs. (0) and (0) as follows: 



x'^x'(W)=x"(W,{f}) = E 

expt. 



j=i fe=i 



(11) 



where {a} (like {r} above), are the PDF parameters {a} at the minimum, and 

Di = D,-J2rkPki, (12) 
fc=i 

are the data points adjusted by the systematic errors that give rise to the best fit. 

Equation (11) has the simple appearance of the naive Xo function (|6|); but it is precise. 
We note: (i) The systematic shifts of the data points associated with the fit are absorbed into 
{Di}. (ii) The denominator af consists of uncorrelated errors only, (iii) The additional (last) 
term on the right-hand side is just a "constant" when we compare the fit with data, since it is 
independent of the index i. When the theory curves (obtained from {Tj}) are compared to the 
adjusted data points {Di} to assess the goodness-of-fit, one can regard the uncorrelated errors a, 
as the only measurement error — clearly a great simplification because these errors are random. 
Figures prepared in this way will give a much more truthful picture of the quality of the fit than 
comparing theory directly with {Di}, since effects due to the unseen correlated systematic errors 
are impossible to visualize. 

B.3 Error analysis of DIS data 

The DIS experiments form the bedrock of global analysis of parton distributions. It is seen in Table 
1^ that the BCDMS and HI data are within normal statistical expectations, having /N ~ 1. The 
ZEUS data have a marginally larger-than-expected /N , of 267/229. For 229 data points, the 
probability that x^ > 267 is 0.063. The NMC data has a much larger xV^ of 305/201. This 
corresponds to a nominal probability of very small magnitude indeed — 3.1 x 10^^. (This fact was 
first observed in an early GKK study 1^^.) It is important to understand the reasons for these 
numbers and to determine whether the fit to these data sets is acceptable. In addition, in order to 
assess the uncertainty range of the global analysis, we will need to adopt some uniform procedure 



for evaluating probabilities among the experiments (cf. Sec. B.4). To both these ends, it is useful to 
look a little deeper into the x^ values for the individual experiments, utilizing the tools developed 
in the two preceding subsections. We use ZEUS as the first example, since its x^ value is at the 
boundary of being "normal" . 

ZEUS data on F2p(x, Q^): Figure | in Sec. pXT| shows the CTEQ6M F2p{x, Q^) (solid curves) 
compared to the ZEUS data (corrected by the systematic errors determined by the fit). As noted 
there, the agreement between theory and data looks quite good. In order to understand the larger- 
than-expected x^ value mentioned above, we need to look deeper. Some insight is provided by 
Fig. |l^. Part (a) of this figure shows a histogram of 

Di-Ti 

Ai = — (13) 
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for the ZEUS data. The curve is a Gaussian of width 1 with integral equal to = 229, the number 
of data points. This histogram is the so-called "pull distribution" — the difference between data 
and theory in units of the statistical error. 



Fig. 18 : (a) Histogram of Aj in Eq. ( |l^ ) for the ZEUS data. The curve is a Gaussian of width 1. 
(b) A similar comparison but without the corrections for systematic errors on the data points. 





-2 

(Di-Ti)M 



The experimental fluctuations seen in Fig. 18a appear (i) Gaussian, (ii) peaked near 0, and (iii) of 
the normal width. Considering that there are 10 different sources of systematic errors, this test of 
statistical consistency gives some confidence in the fit. 

Next, we note that the net shifts of the data points due to systematic errors are significant, 



but within the range expected by normal statistics. This is shown in Fig. 18b, which is similar 



to Fig. 18a but without correcting the data points for systematic errors. A displacement of < 0.5 
units in A is clearly seen, but that is not unreasonably large. A detailed account of the corrections 
{ri, f2, . . . , fi^} (with K = W for this ZEUS data set) is shown in the following table: 



k 


1 


2 


3 


4 


5 


6 7 


8 9 


10 


rk 


1.67 


-0.67 


-1.25 


-0.44 


-0.00 


-1.07 1.28 


0.62 -0.40 


0.21 



Assuming that each systematic error has a Gaussian distribution with the published standard 
deviation, the probability distribution of should be P(r) = e~''^/^/\/27r. None of the corrections 
listed in the table is far outside this distribution. We conclude that the global fit is consistent with 
the experimental systematic errors, and the corrections calculated from Eq. ^ are reasonable. 

NMC data on F2(x, Q^) from /xp scattering: The comparison of NMC data on F2{x,Q^) 
to the CTEQ6M fit has been shown in Fig. ^b. Sec. |3.1.1| . Data and theory appear to be in 
general agreement — there are no systematic patterns of deviation. However, there are clearly 
outlying points, showing large point-to-point fluctuations of individual data points around the 
smooth theory curves. To quantify this observation, we again examine the pull distribution for this 
data set, shown in Fig. |l^ in the same form as Fig. |l8| for the ZEUS data. Figure p!9| a shows that (i) 
the measurement fluctuations appear fairly Gaussian, (ii) the distribution is peaked around A = 0, 
but (iii) the width of the actual histogram appears broader than the normal distribution — there 
is an excess in outlying (large fluctuation) points, and a corresponding depletion in central (small 
fluctuation) points. 
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Fig. 19 : 
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(a) Histogram of Aj in Eq. ( p!3| ) for the NMC data, (b) A similar comparison but 
without the corrections for systematic errors on the data points. 
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Given the larger-than-normal fluctuations, how do the systematics of the fit measure up? 



Comparing Fig. 19a and Fig. |19|b, which shows the fluctuations of the uncorrected data, indicates 
that the net shift in A due to systematic errors is ~ 0.3, which is quite reasonable. The separate 
shifts associated with the 11 distinct systematic errors for this flt are: 



k 


1 


2 


3 


4 


5 


6 


7 


8 9 


10 


11 




0.67 


-0.81 


-0.35 


0.25 


0.05 


0.70 


-0.31 


1.05 0.61 


0.26 


0.22 



We see that the individual deviations are ah reasonable, just like the case for ZEUS. 

The size of the fluctuations of the data points has very little to do with the viability of the 
theory model — the excess due to such fluctuations cannot be reduced by any improvement in the 
theory which produces smooth predictions. The quality of the fit is tied much more closely with the 
systematics, which appear quite normal in the above comparison. As observed before, in Sec. p.l.l] , 
the most noticeable fiuctuations for the NMC data — points with almost the same (x, Q) values — are 
from data sets taken at different incoming energies. (They can be resolved experimentally, as in 



the combined data set |21] which we do not use.) For these reasons, we consider the fit to the NMC 
data to be acceptable. 

This example vividly illustrates the usefulness of the method of error analysis adopted here, 
compared to the traditional one based on the covariance matrix. Equation ( [lT|) and the plots based 
on it that we have shown allow a separate examination of the contributions to from correlated 
and uncorrelated errors, which is useful for a more informed assessment of the quality of the fit. The 
traditional approach, dealing only with the overall x^i can be rather limiting when the experimental 
errors do not conform to the ideal distributions. 



B.4 The tolerance criterion 

In this section, we describe the procedure to calculate the range of uncertainty for the parton 
distributions. In the Hessian approach we adopt, this range is characterized by an overall tolerance 
parameter, T, that specifies the acceptable neighborhood around the global x^ minimum in the 
parton parameter space by the condition Ax^ < T^. Uncertainties of the PDF's and their physical 
predictions are all linearly proportional to T. We arrive at a quantitative estimate of T by examining 
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the range of overall along each of the eigenvector directions within which a good fit to all data 
sets can be obtained, and then "averaging" the ranges over the 20 eigenvector directions. The 
range of acceptable fits along a given direction is estimated by combining the constraints placed on 
acceptable fits by each individual experiment included in the fit, as described below. 

A key feature of this method (which makes the entire approach practical and reliable) is the 
use of an orthonormal basis in the parton parameter space. An important consequence is that 
the constant hypersurfaces are simple spheroids. It comes as no surprise, then, that when we 
carry out the mapping of the allowed ranges along the different eigenvector directions, we find that 
they have the same order of magnitude, so that the averaging of the results to obtain an overall T 
estimate makes sense. For this reason, we do not need to show the details for all 20 eigenvector 
directions. 

Let us consider the direction of Eigenvector 4 as an example. Consider points along this 
direction in the neighborhood of the global minimum, labelled by D, the distance from the mini- 
mum. These points are candidate fits. We first evaluate the acceptable ranges of fits with respect 
to the individual experiments, according to the known experimental uncertainties. For experiment 
(e), the individual function, Xg; is a quadratic function of D with minimum at some value Z)e- 
Following Ref. |0] (Sec. 4.1), we define the range of fits acceptable with respect to experiment (e) 
by finding the upper and lower bounds of D — denoted by — using the criterion 



P{x\N,) dx' 



0.9 



(14) 



where P{'x^,N(.) is the standard x'^-distribution for Nf. data points. The ranges for the individual 
experiments obtained this way, each shown as a line with the minimum of Xei^) marked by a dot. 



are displayed in Fig. 20 



Fig. 20 : Uncertainty ranges (vertical lines) for the input experiments along the Eigenvector 4 
direction. The horizontal lines indicate the tolerance T, as discussed in the text. 
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For Eigenvector 4 we see that the strongest bounds on D come from the data sets Hla (low Q 
HI data) and BCDMSd (BCDMS deuteron target) on the negative D side, and from the data sets 
CCFRF3 (CCFR measurements of F3) and ZEUS on the positive D side. 



A practical problem must be resolved in producing the results shown in Fig. |20|: how does one 
obtain meaningful, and reasonably uniform, estimates of the acceptable ranges for the various ex- 
periments when the values of x^/^ vary considerably among the data sets — and in both directions, 
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above and below the canonical value of 1.0? (Cf. Table Q.) In the preceding subsection, we consid- 
ered some experiments with larger-than-normal values of /N . The investigations there suggest 
that the deviations from normal statistical expectations can be attributed mainly to unexplained 
fluctuations. In order to obtain the acceptable ranges for the individual experiments in a uniform 
way, it is reasonable to calculate the bounds for experiment e using the renormalized variable 
Xe/[Xe(0)/-^e], assumiug that it obeys a chi-squared distribution with A'^e degrees of freedom. Here 
Xe(0) is the of experiment e for the standard fit Sq. This procedure provides a pragmatic and 
uniform way to deal with the problems of acceptability and compatibility among the experimental 
input encountered in the global analysis. It is often used in other situations involving reconciling 
data from different experiments p^]. 

Eigenvector 4 was chosen for illustration purposes. The other cases are similar. Figure 21 
shows results for the uncertainty ranges of the experiments along a different direction in the parton 
parameter space — that of Eigenvector 18. In this case the strongest bounds on the displacement 
come from the DY and jet experiments. 



Fig. 21 : Same as Fig. 20, except for Eigenvector 18. 

Eigenvector 18 



The next problem is to convert the individual ranges in Fig. 2C or Fig. 21 into a single 
uncertainty measure. There is no unique way to do this, because the individual ranges shown in 
these plots are not statistically independent — the candidate fits along the eigenvector direction are 
all obtained by global fits to the full set of experimental input. In practice, we estimate the overall 
acceptable range along each eigenvector direction by using the two most restrictive experimental 
constraints on either side of the minimum. In this way, we obtain the bounds \D\ ~ 10 for both 
Eigenvector 4 and Eigenvector 18, indicated by the horizontal lines in Fig. 2C and Fig. 21. Note 
that these bounds come from different experiments, since the eigenvector directions are sensitive 
to different features of the parton distributions. 

The above procedure for estimating the uncertainty of D was previously used in |12| to 
estimate the uncertainties of specific physical predictions (such as cross sections at the Tevatron 
and LHC colliders), using the Lagrange multiplier method. Each physical variable corresponds to 
a definite direction in the parton parameter space. The two examples shown above, along two 
eigenvector directions, are typical for all the directions. Thus, an overall estimate for the tolerance 
of T = 10 follows, as an "average" over all directions in the parton parameter space. This T 
corresponds to a range of variation of the global of Ax^ = = 100. 
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Representing the uncertainties of PDF's and their physical predictions by one single number 
T is clearly an oversimplification; it can only be approximate. But given the complexity of the 
problem, it should be equally clear that attempting to be more precise than this would be rather 
unrealistic at this stage. In addition to the practical (hence imprecise) measures that need to 
be adopted to deal with the diverse experimental data sets with nonuniform error specifications, 
there are additional theoretical uncertainties not yet included in the analysis because they are not 
easily quantified. In spite of these shortcomings, however, this method of estimating uncertainties 
is far more systematic and quantifiable than the ad hoc procedures that have been used in the 
past. The important point is that this method, based on the established Hessian formalism, is 
fundamentally sound; its current limitations are due to compromises in implementation necessitated 
by experimental realities. The implementation can be systematically improved as both experimental 
and theoretical input improve with time. 

As already noted, the estimated tolerance of T = 10 contains experimental uncertainties only. 
Uncertainties of theoretical or phenomenological origin are not included because they are difficult 
to quantify. They might be significant. For instance, we have seen throughout this paper that the 
parametrization of nonperturbative PDF's has a big influence on the results. Therefore in physical 
applications the criterion T = 10 must be used with awareness of its limitations. 



C study of Higher Twist Effects 

This appendix addresses the question whether it necessary to include higher-twist contributions in 
the global analysis. It provides the background study that leads to the conclusion mentioned in 



Sec. |3.3.2| , that, with the kinematic cuts to data included in our global analysis, higher-twist terms 
are not needed. 

There have been several studies of the effects of higher-twist (HT) contributions, or power- 
law corrections, to i*2(x,Q^) |§,|5|, These usually place some emphasis on describing data 
in the small-Q range (say, 1 GeV < Q < 2 GeV) where higher-twist effects are expected to be 
noticeable. In the global analysis of PDF's, the emphasis is different: in order to reliably extract 
the universal parton distributions, it is desirable to focus on the twist-2 sector of PQCD, without 
the complications of any process-dependent (and model-dependent) effects, such as HT. Thus, we 
adopt kinematic cuts of Q > 2 GeV and > 3.5 GeV to minimize HT effects. The question then 
becomes whether these kinematic cuts are sufficient to render HT effects numerically insignificant .p^ 
This question can only be answered phenomenologically by investigating whether the inclusion of 
HT terms in the theory model is needed to achieve a satisfactory fit, and whether the added degrees 
of freedom lead to a statistically significant improvement in the fit. 

To perform this study, we compare the results of our standard fits with those including 
HT effects. The model adopted for the higher- twist term is similar to that used in the literature 



F2{x,Q') = ^^(x,Q^) ( 1 + ) (15) 



^"One would like to keep the kinematic cuts as low as is practical, in order to include as much high-statistics data 
as possible (at relatively low Q) in the global analysis. 
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where 



H(x) 



ho + hix + h2X^ + h^x^ + /i4X^. 



(16) 



The five parameters {/iq, . . . , /14} are determmed by minimizing x^, along with the other model 
parameters. For simplicity, and following the earlier higher- twist studies, we have assumed that 
{/lo, • • • , /i4} are the same for all DIS processes. 

The table below shows the results of our higher-twist study. The first column is the standard 
fit CTEQ6M, with no higher twist, i.e., = 0. The second column is the best fit with higher-twist 
corrections. The global is slightly lower with the higher-twist correction (as it must be, since 
there are more parameters in this fit), but the reduction Ax^ = —23 (with 5 extra parameters) 
does not represent a real improvement of the fit (considering our adopted tolerance). The table 
also lists the values of /N for individual DIS experiments. 





CTEQ6M 


Higher-twist fit 




1954 


1931 






BCDMS proton 


1.11 


1.07 


BCDMS deuteron 


1.11 


1.03 


HI A 


0.95 


0.94 


HI B 


1.02 


1.02 


ZEUS 


1.15 


1.15 


NMC 


1.52 


1.50 



Note that the reduction of from inclusion of higher twist, comes mainly from an improved 
fit to the BCDMS data on /xp and /xd DIS. The HERA experiments are not affected by the higher- 
twist correction, as one would expect. The NMC experiment, which, like BCDMS, has data points 
at low values of Q, is fit only slightly better by including the higher-twist factor H(x). Figure ^ is 
a graph of the optimal function H{x). It is qualitatively similar to that found in previous studies. 

Fig. 22 : The function H{x) for the phenomenological higher-twist correction to DIS. 
H{x) 

0.2 r 

0.1 




Because the NMC data has a large P^r point compared to other DIS experiments, we 
have also studied whether a higher- twist function H^mq{x) can be found that would significantly 
reduce the value of /N for the NMC experiment. The best fit for the NMC data, keeping the 
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PDF's unchanged but optimizing the higher-twist correction to the NMC data, has /N = 1.37. 
The associated function H^mc{x) is somewhat different from the best global H{x) — more strongly 
negative for x < 0.5. However, this "best fit" for NMC causes substantial increase in the x^'s 
of the other DIS experiments — the total increases (from 1954 for CTEQ6M) to 2232. Thus, 
introducing HT contributions does not solve the apparent problem of larger-than-normal foi' the 
NMC experiment in the global fit. This is not surprising, considering our relatively large cut in the 
minimum value of = A GeV^. 



D Study of Parametrizations 



We have noted in Sees. 2.5, |^ and 5.2 that parametrization of the nonperturbative QCD PDF's 



now has an important bearing on the results of the global analysis, given the much improved 
experimental constraints and the newly developed theoretical methods. We have rather extensively 
explored the influence of parametrization, using the functional forms described in Sec. |2.5| and the 
iterative Hessian eigenvector method for matching the degrees of freedom in the parametrization 
with applicable experimental constraints (cf. Sec. ^^ )- 

Even with tools like these, the choice of parametrization still ultimately involves subjective 
(i.e., physical) judgements. In this appendix, we shall give only one example of the studies we 
have carried out, relating to the behavior of the gluon distribution at both small and large x, as 



discussed in Sec. 3.2.1 and Sec. 5.2. 



This study was motivated by an attempt to understand the results of CTEQ6 fits described 
in Sec. ^ in light of two issues raised by |jl^: (i) Do the recent HERA data imply a negative 
gluon distribution at small x at the scale Q = 1 GeV? (ii) Do good fits to the new D0 jet data 
necessitate an artificial-looking humped structure of the gluon at large x at low Q? The seeming 



contradiction between the observations of [13|, which give rise to these questions, and the apparent 



good all-around CTEQ6 fits is resolved by this exercise. 



First, the observations of [13| were confirmed when we performed a global fit using the 
MRST2001 functional form for the gluon, but keeping our parametrization of the quark degrees of 
freedom, and fitting to our full set of data with our definition (|To|) for the global x^- Figure p^ a 
shows the resulting gluon distribution at Q = 1, 2, 5, 100 GeV. At the scale Q = 1 GeV, both the 
negative gluon at small x and the humped structure at large x found by the least-x^ MRST2001J 
fits are reproduced. For comparison. Figure p^ b shows the same distributions from a fit using our 
parametrization. (To make the comparison possible, this fit was done with Qq = 1 GeV in place 
of the CTEQ6 value Qq = 1.3 GeV.) In this case, there is only a slight shoulder at large x for 
Q = 1 GeV, and the distribution has become completely smooth hy Q = 2 GeV. Hence the hump 
structure seen by MRST is an artifact of the particular choice of the parametrization. The gluon 
distribution is constrained to be positive definite in this parametrization. These two fits differ in 
overall by an amount that is less than our tolerance estimate, so we do not find convincing 
evidence for a negative gluon distribution even at the very low scale of Q = 1 GeV, where the 
parton distributions are ambiguous anyway because of higher-order and higher-twist corrections. 
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Fig. 23 : (a) The gluon distribution at Q = 1, 2, 5, 100 GeV obtained from a global fit using the 
MRST functional form for the initial gluon; (b) for comparison, the same gluon distributions from 

the CTEQ6 parametrization with Qq = 1 GeV. 
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Since there is no firm theoretical requirement for a positive-definite gluon distribution at Q 
as low as 1 GeV, we also tried an alternative parametrization for the gluon in which the CTEQ6 
functional form is multiplied by an additional factor {\^ A^j x) that allows Qo) to go negative 
at small x. The resulting gluon distributions are shown in Fig. 



Fig. 24 : (a) The gluon distribution at Q = 1, 2, 5, 100 GeV obtained from a global fit in our 
parametrization, but allowing for negative gluon at small-x; and (b) Gluon uncertainty band at 
Q = 1 GeV, covering both + and - regions. Dashed:CTEQ5, dotted:MRST2001. 
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This fit has one more fitting parameter, and hence it results in a slightly lower overall than 
CTEQ6M, but the reduction is well within our tolerance range. An interesting feature of this fit 



is, of course, that it does become negative at Q = 1 GeV. But, like the case in Fig. 23a, the 
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distribution rapidly becomes positive under QCD evolution. No trace of the negative region is seen 
at Q = 2 GeV. Is the seeming preference for a negative gluon at Q = 1 GeV physically significant? 
We can answer this question quantitatively by mapping out the range of uncertainty of the gluon 
distribution at Q = 1 GeV using the Hessian method. The result is shown in Fig. 24 b. The range 
of uncertainty, given current experimental constraints, is very large at the Q = 1 GeV scale, and 
it covers both positive and negative territories at small x, as well as large x\ Due to the nature of 
QCD evolution, the uncertainty decreases rapidly with increasing Q, as shown in Fig. Eq. 



Fig. 25 : Gluon uncertainty band at Q 



2 GeV. Same format as Fig. 24 
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